Automatic Generation of Syntactically Well-formed and Semantically Appropriate Paraphrases

Author

  • Atsushi Fujita
Abstract

Paraphrases of an expression are alternative linguistic expressions conveying the same information as the original. Technology for handling paraphrases has been attracting increasing attention due to its potential in a wide range of natural language processing applications, e.g., machine translation, information retrieval, question answering, summarization, authoring and revision support, and reading assistance. In this thesis, we focus on lexical and structural paraphrases in Japanese, such as lexical and phrasal replacement, verb alternation, and topicalization, which can be generated relying on linguistic knowledge only.

First, we address how to generate well-formed and appropriate paraphrases. One of the major problems is that it is practically impossible to take into account all the semantic and discourse-related factors that affect the well-formedness and appropriateness of paraphrases. The knowledge used for paraphrase generation, such as transformation rules, tends to be underspecified and can thus produce erroneous output. A revision process is therefore introduced to detect and correct ill-formed and inappropriate candidates generated in the transfer stage. Within this transfer-and-revision framework, we first investigate what types of errors tend to occur in lexical and structural paraphrasing, and confirm the feasibility of the framework by showing that most errors occur irrespective of the class of transformation rule. On the basis of another observation, namely that errors associated with case assignment form one of the major error types, we develop a model for detecting this type of error. The model utilizes a large collection of positive examples and a small collection of negative ones by combining supervised and unsupervised machine learning methods. Experimental results indicate that our model significantly outperforms conventional models.

The second issue is to develop a mechanism that is capable of covering a wide variety of paraphrases. One way of increasing the coverage of paraphrase generation is to exploit the systemicity underlying several classes of paraphrases, such as verb alternation and compound noun decomposition. To capture the semantic properties required for generating these classes of paraphrases, we utilize the Lexical Conceptual Structure (LCS). The framework represents verbs as semantic structures with focus of statement and relationships between semantic arguments and syntactic cases. We implement a paraphrase generation model which consists of a case assignment rule and a handful of LCS transformation rules, with particular focus on ...

∗ Doctoral Dissertation, Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-DD0261023, March 2005.

Similar resources

Mutaphrase: Paraphrasing with FrameNet

We describe a preliminary version of Mutaphrase, a system that generates paraphrases of semantically labeled input sentences using the semantics and syntax encoded in FrameNet, a freely available lexicosemantic database. The algorithm generates a large number of paraphrases with a wide range of syntactic and semantic distances from the input. For example, given the input “I like eating cheese”,...

Automatic generation of large-scale paraphrases

Research on paraphrase has mostly focussed on lexical or syntactic variation within individual sentences. Our concern is with larger-scale paraphrases, from multiple sentences or paragraphs to entire documents. In this paper we address the problem of generating paraphrases of large chunks of texts. We ground our discussion through a worked example of extending an existing NLG system to accept a...

Paraphrase and Textual Entailment Generation

A particular piece of information can be conveyed by many different sentences. This variety concerns the choice of vocabulary and style as well as the level of detail (from laconism or succinctness to total verbosity). Although verbosity in written texts is considered bad style, generated verbosity can help natural language processing (NLP) systems to fill in the implicit knowledge. The paper presents...

Automatic Paraphrasing of Japanese Functional Expressions Using a Hierarchically Organized Dictionary

Automatic paraphrasing is a transformation of expressions into semantically equivalent expressions within one language. For generating a wider variety of phrasal paraphrases in Japanese, it is necessary to paraphrase functional expressions as well as content expressions. We propose a method of paraphrasing of Japanese functional expressions using a dictionary with two hierarchies: a morphologic...

Automatically Constructing a Corpus of Sentential Paraphrases

An obstacle to research in automatic paraphrase identification and generation is the lack of large-scale, publicly available labeled corpora of sentential paraphrases. This paper describes the creation of the recently released Microsoft Research Paraphrase Corpus, which contains 5801 sentence pairs, each hand-labeled with a binary judgment as to whether the pair constitutes a paraphrase. The cor...

Publication date: 2005